Abstract: The achievable information rate of finite-state input
two-dimensional (2-D) channels with memory is an open prob- 
lem, which is relevant, e.g., for inter-symbol-interference (ISI) 
channels and cellular multiple-access channels. We propose a 
method for simulation-based computation of such information 
rates. We first draw a connection between the Shannon-theoretic 
information rate and the statistical mechanics notion of free 
energy. Since the free energy of such systems is intractable, we 
approximate it using the cluster variation method, implemented 
via generalized belief propagation. The derived, fully tractable, 
algorithm is shown to provide a practically accurate estimate of 
the information rate. In our experimental study we calculate the 
information rates of 2-D ISI channels and of hexagonal Wyner 
cellular networks with binary inputs, for which formerly only 
bounds were known. 

Submitted to ISIT 2005 

I. INTRODUCTION 

Two-dimensional (2-D) finite-state input channels with
memory constitute an important class of channels, which appears
in a wide range of fields. For example, in inter-symbol
interference (ISI) channels, which are applicable to
magnetic and optical recording devices, finite-state symbols
are ordered on a 2-D grid, and each symbol causes interference
in a limited neighborhood.

A second example concerns multiple-access channels in
cellular networks. In a seminal work [1], Wyner introduced
a simple, yet insightful, analytically solvable model
for cellular networks with Gaussian signaling, which yields
considerable insight into the ultimate information-theoretic
limits of realistic cellular networks. In addition to a naive
one-dimensional (1-D) extension of a single cell system, Wyner
also analyzed the traditional 2-D hexagonal topology, where
interference is caused by neighboring cellular tiers. Hence, in
the case of binary signaling, Wyner's model can be viewed as an
instance of a finite-state input dispersive channel.

The capacity of finite-state input dispersive channels is 
defined as the maximum mutual information rate over all 
input distributions. Computing this capacity for 1-D and 
2-D channels is an open problem. Calculating the mutual 
information rate in the case of a predefined stationary input 
distribution is, in principle, a simpler problem. For example,
for input symbols which are i.i.d. and equiprobable, this rate is
termed the symmetric information rate (SIR); it provides
a limit on the achievable rate of reliable communication in
this common case.

Various bounds, either rigorous [2]-[4], numerical [5], [6] or
conjectured [7], on the capacity and SIR of certain finite-state
input 1-D dispersive channels have been proposed. Recently,
several authors introduced simulation-based methodologies
for computing such information rates ([8] and references
therein). In this approach, the forward recursion of the
sum-product (BCJR) algorithm [9] is used for estimating the
a-posteriori probability (APP), from which the 1-D information
rates are derived. As for 2-D channels, due to their inherent
complexity, only upper and lower bounds on the information
rate are known [10].

In this paper we propose a simulation-based method for 
estimating the information rate of 2-D channels. This method 
can be viewed as an extension of its 1-D Monte-Carlo counter- 
part [8], where a fully tractable generalized belief propagation 
(GBP) receiver replaces the sum-product algorithm as an APP 
inference engine. 

In a former work [11] we have shown that a GBP receiver
performs very well as an APP detector for dispersive 2-D
channels$^1$. In this work we utilize another aspect of GBP, i.e.,
its remarkable ability to approximate the free energy of 2-D
channels, as we draw the connection between the information
rate and the free energy [12].

The paper is organized as follows. Section II introduces
the dispersive 2-D channel model, while Section III derives
the form of the information rate and draws its connection
to the free energy. Since the free energy of 2-D channels is
intractable, Section IV presents a method for approximating it,
which is then applied in the context of probabilistic graphical
models in Section V. Section VI evaluates the quality of the
free energy approximation, as compared to its exact value.
Next, simulation results for the information rate of a 2-D
ISI channel and a hexagonal Wyner cellular network are
provided. The results are discussed in Section VII.

We shall use the following notations. The operator $\{\cdot\}^T$
stands for a vector or matrix transpose, and $\{\cdot\}_i$ and
$\{\cdot\}_{ij}$ denote entries of a vector and matrix, respectively.

$^1$ A detector based on standard belief propagation often fails to
converge in 2-D channels.



II. CHANNEL MODEL 

Consider an $N \times N$ 2-D finite-state input channel with
memory in the form

$$ y_{k,l} = d_{k,l} + v_{k,l} + \sum_{(i,j) \in \partial(k,l)} \alpha_{ij}\, d_{i,j}, \qquad \forall k,l = 1,\ldots,N, \tag{1} $$

where $y_{k,l}$, the channel's output observation at symbol
$(k,l) \in \mathbb{Z}^2$, is the sum of the finite-state alphabet
input symbol $d_{k,l}$, assumed to be taken from a stationary
process, and two additional terms. The first term, $v_{k,l}$,
represents ambient additive white Gaussian noise (AWGN), while
the second term is the scaled interference caused by the symbols
adjacent to $(k,l)$, whose index set is denoted by $\partial(k,l)$.
The parameter $\alpha_{ij}$ ($\alpha_{ij} < 1$) controls
the interference attenuation. The interference term is assumed
to be spatially invariant (excluding boundary symbols), which
together with the assumptions regarding $d_{i,j}$ and $v_{i,j}$
guarantees that $y_{k,l}$ ($k,l = 1,\ldots,N$) are stationary
random variables. We also assume that the channel is perfectly
known on the receiver's side, which can jointly process all
observations.

Stacking all the observations, data symbols and noise samples
into $N^2 \times 1$ vectors $\mathbf{y}$, $\mathbf{d}$ and
$\mathbf{v}$, respectively, (1) can be rewritten as

$$ \mathbf{y} = \mathbf{S}\mathbf{d} + \mathbf{v}, \tag{2} $$

where the $N^2 \times N^2$ matrix $\mathbf{S}$ encapsulates the
memory/interference structure. Each 2-D channel is uniquely
defined by its interference matrix $\mathbf{S}$. Our basic
assumption, which later allows for a graphical model
interpretation, is that interference is caused by neighboring
symbols, i.e., $\mathbf{S}$ is a relatively sparse matrix. The
upper pane in Fig. 1 represents the interference structure of
two topologies: ISI (a) and a hexagonal Wyner cellular network
(b). In the following derivations we assume real-valued data
signaling $\mathbf{d}$, interference $\mathbf{S}$ and noise
$\mathbf{v} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I}_{N^2})$
(an extension to the complex domain is straightforward).
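As an illustration of (2), the following Python sketch builds a sparse interference matrix and draws one channel realization. It is not the exact topology of Fig. 1: the four-nearest-neighbour structure, the single attenuation value `alpha`, the row-major stacking order and the function names are all assumptions made for the example.

```python
import numpy as np

def build_interference_matrix(N, alpha):
    """N^2 x N^2 matrix S with unit diagonal and gain alpha for each of the
    (assumed) four nearest neighbours; boundary symbols have fewer neighbours."""
    S = np.eye(N * N)
    for k in range(N):
        for l in range(N):
            row = k * N + l  # row-major index of symbol (k, l)
            for dk, dl in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                i, j = k + dk, l + dl
                if 0 <= i < N and 0 <= j < N:
                    S[row, i * N + j] = alpha
    return S

def simulate_channel(S, sigma, rng):
    """Draw equiprobable +/-1 inputs d and return (d, y) with y = S d + v."""
    n = S.shape[0]
    d = rng.choice([-1.0, 1.0], size=n)
    v = rng.normal(0.0, sigma, size=n)  # AWGN with variance sigma^2
    return d, S @ d + v

rng = np.random.default_rng(0)
S = build_interference_matrix(N=3, alpha=0.5)
d, y = simulate_channel(S, sigma=1.0, rng=rng)
```

For a 3 x 3 grid this S has 33 non-zero entries out of 81, which is the kind of sparsity the graphical-model interpretation relies on.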

III. INFORMATION RATE

A. Basic Definitions 

The information rate, i.e., mutual information per symbol,
between the channel's input $\mathcal{X}$ and output $\mathcal{Y}$ is

$$ I(\mathcal{X};\mathcal{Y}) = h(\mathcal{Y}) - h(\mathcal{Y}|\mathcal{X}), \tag{3} $$

where $h(\cdot)$ are (differential) entropy rates. By definition,
the entropy rate $h(\mathcal{Q})$ of a stationary process
$\mathbf{q} = \{q_1,\ldots,q_L\}^T$ is given by
$\lim_{L\to\infty} h(\mathbf{q})/L$. Let us deal separately with
the two terms in (3).

The second term, $h(\mathcal{Y}|\mathcal{X})$, is given by
$\lim_{N\to\infty} h(\mathbf{y}|\mathbf{x})/N^2$, but since
$h(\mathbf{y}|\mathbf{x}) = h(\mathbf{v})$, and $\mathbf{v}$ is
AWGN, it is straightforward to verify that
$h(\mathcal{Y}|\mathcal{X}) = \log(2\pi e \sigma^2)/2$.

In order to calculate $h(\mathcal{Y})$ we apply the
Shannon-McMillan-Breiman theorem [13]$^2$, which states that for
a stationary and ergodic channel the entropy rate can be
calculated by

$$ -\frac{1}{N^2} \log p(\mathbf{y}) \xrightarrow{N\to\infty} h(\mathcal{Y}) \quad \text{with probability 1}, \tag{4} $$

$^2$ The theorem also applies to continuous random variables [14].
Fig. 1. Upper pane: Interference structures for two types of 3 × 3 2-D
channels: (a) ISI grid, (b) hexagonal Wyner cellular network. The arrows
mark the direction of interference. Lower pane: The corresponding undirected
graphical model representation of the channels in the upper pane: (c) ISI
grid, (d) hexagonal Wyner cellular network. Full nodes represent (hidden)
transmitted bits, while empty nodes correspond to the observations. Interaction
couplings (compatibility functions) $\psi_{ij}$ are denoted by a solid line
connecting two full nodes, while the external field potential (evidence)
$\phi_i$ is depicted by a solid line connecting a full node and an empty node.
For clarity we use dotted edges in (d) to represent the extra edges added
compared to the graph (c).



where p(y) is the joint distribution of the channel's output y. 
Hence, in order to calculate the information rate one needs to 
calculate p(y) in the limit of large systems, as described in 
the next section. 
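The convergence in (4) can be checked numerically in the one case where the limit is known in closed form, namely the noise term. The sketch below (an illustration only; the function name, sample size and value of `sigma` are arbitrary choices, and logarithms are natural so the result is in nats) compares the sample average of $-\log p(v)$ with $\log(2\pi e\sigma^2)/2$.

```python
import numpy as np

def empirical_entropy_rate(v, sigma):
    """-(1/n) log p(v) for n i.i.d. N(0, sigma^2) samples, in nats."""
    n = v.size
    log_p = -0.5 * n * np.log(2 * np.pi * sigma**2) - np.sum(v**2) / (2 * sigma**2)
    return -log_p / n

sigma = 0.7
rng = np.random.default_rng(1)
v = rng.normal(0.0, sigma, size=200_000)

estimate = empirical_entropy_rate(v, sigma)                # sample-based value
closed_form = 0.5 * np.log(2 * np.pi * np.e * sigma**2)    # h(Y|X) from the text
```

With this many samples the two values should agree to within a few thousandths of a nat; the same sample-average idea is what (4) applies to the much harder quantity $p(\mathbf{y})$.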

B. The Connection to Free Energy 

Using Bayes' law, $p(\mathbf{y})$ can be rewritten as

$$ p(\mathbf{y}) = \sum_{\mathbf{x}} p(\mathbf{y}|\mathbf{x}) \Pr(\mathbf{x}) = \sum_{\mathbf{x}} p(\mathbf{v}) \Pr(\mathbf{x}), \tag{5} $$

where $\mathbf{v} = \mathbf{y} - \mathbf{S}\mathbf{x}$ and
$\sum_{\mathbf{x}}$ corresponds to a sum over all the possible values
of the transmitted symbols $\mathbf{d}$. Hereinafter, for exposition
purposes we consider the case of an equiprobable i.i.d. binary-input
alphabet, i.e., $d_i \in \{\pm 1\}$. Hence, using the distribution of
$\mathbf{v}$, (5) can be rewritten as

$$ p(\mathbf{y}) = Z \cdot (2C)^{-N^2}, \tag{6} $$

where $C = (2\pi\sigma^2)^{1/2}$, and

$$ Z = \sum_{\mathbf{x}} \exp\left( -\frac{1}{2\sigma^2} \|\mathbf{y} - \mathbf{S}\mathbf{x}\|^2 \right) \tag{7} $$

is the partition function.

Inserting (6) into (4), and the result into (3),
$I(\mathcal{X};\mathcal{Y})$ can be written as

$$ \log 2 - \frac{1}{2} + F \xrightarrow{N\to\infty} I(\mathcal{X};\mathcal{Y}) \quad \text{with probability 1}, \tag{8} $$

where

$$ F = -\frac{1}{N^2} \log(Z) \tag{9} $$

is recognized as the free energy per symbol [12], [15]. Hence,
the problem of calculating the information rate boils down to
estimating the free energy of an infinite system, as discussed
in the next section. The information rate in (8) is termed
the symmetric (a.k.a. uniform-input) information rate (SIR),
due to the assumption regarding the uniformity of the input
symbols. A similar analysis also holds for other stationary
finite-state input distributions.
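For very small systems, (7)-(9) can be evaluated directly by enumerating all $2^{N^2}$ input configurations. The following Python sketch does so; it is a brute-force illustration rather than the method proposed in this paper, the function names are hypothetical, and `n` stands for $N^2$. A log-sum-exp shift is used to avoid numerical underflow.

```python
import itertools
import numpy as np

def log_partition_bruteforce(S, y, sigma):
    """log Z, with Z = sum over x in {-1,+1}^n of exp(-||y - S x||^2 / (2 sigma^2))."""
    n = S.shape[1]
    X = np.array(list(itertools.product((-1.0, 1.0), repeat=n)))  # shape (2^n, n)
    energies = np.sum((y - X @ S.T) ** 2, axis=1) / (2 * sigma**2)
    m = energies.min()  # shift so the largest term in the sum is exp(0)
    return -m + np.log(np.sum(np.exp(-(energies - m))))

def sir_estimate_nats(S, y, sigma):
    """Finite-size value of log 2 - 1/2 + F, with F = -(1/n) log Z, in nats."""
    n = S.shape[1]
    free_energy = -log_partition_bruteforce(S, y, sigma) / n
    return np.log(2) - 0.5 + free_energy

# Toy usage: interference-free channel (S = I) with n = 6 symbols.
rng = np.random.default_rng(2)
n, sigma = 6, 1.0
S = np.eye(n)
x = rng.choice([-1.0, 1.0], size=n)
y = S @ x + rng.normal(0.0, sigma, size=n)
sir_bits = sir_estimate_nats(S, y, sigma) / np.log(2)  # one noisy realization
```

A single realization of such a small system is a noisy estimate; averaging over many realizations and increasing the system size is what makes the left-hand side of (8) approach the SIR. The exponential cost of the enumeration is the reason an approximation of $F$ is needed for larger systems.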

IV. THE FREE ENERGY AND THE CLUSTER VARIATION METHOD

The free energy is a fundamental quantity in statistical
mechanics, and the physics literature has devoted considerable
effort to calculating it. However, evaluating the free energy of
infinitely large 2-D channels such as the one defined in Section II
is infeasible, and one must resort to approximate methods$^3$.

One of the classic methods for approximating free energies is
the Kikuchi approximation, also known as the cluster variation
method (CVM, [16]). The difficulty in exactly calculating the
free energy results from the intractability of the probability
distribution $p(\mathbf{y})$. Hence, the CVM follows a variational
principle: it defines the free energy as a functional of this
probability distribution, $\mathcal{F}(p(\mathbf{y}))$, replaces
$p(\mathbf{y})$ by a tractable trial belief vector
$b(\mathbf{y})$$^4$, then minimizes $\mathcal{F}(b(\mathbf{y}))$
w.r.t. $b(\mathbf{y})$, and takes the minimal value as its
approximation to the free energy. Hence, our idea for estimating
the information rate is to use the CVM over a large enough, yet
finite, system, as the computed free energy per symbol is
conjectured to converge to its exact value for infinite systems.
This idea is empirically validated in Section VI.

Recently, Yedidia et al. [16] have proved a correspondence
between the stationary points of the CVM-based free energy
and the fixed points of a message-passing algorithm from the
field of graphical models termed generalized belief propagation
(GBP). GBP is an extension of the celebrated belief
propagation (BP) algorithm that has been shown to provide
better approximations than BP. Note, in passing, that Yedidia
et al. have also shown that the fixed points of BP correspond
to the stationary points of the Bethe free energy, which is a
special case of the CVM, in the same way that BP is a special
case of GBP. For an elaborate discussion of both CVM and
GBP see [16]. In the following section we describe the channel
from the perspective of graphical models, and briefly describe
our application of the GBP algorithm.

V. THE CONNECTION TO UNDIRECTED GRAPHICAL MODELS

An undirected graphical model with pairwise potentials
(a.k.a. a pairwise Markov random field) consists of a graph $G$
and potential (compatibility) functions $\psi_{ij}(x_i, x_j)$ and $\phi_i(x_i)$

3 Our system corresponds to a random field 2-D Ising system, for which an 
analytical solution is not available [15]. 

$^4$ The trial belief vector is $b(\mathbf{y}) = \prod_{\lambda \in M} p(\mathbf{y}_\lambda)^{c_\lambda}$,
where $\lambda$ is a 'cluster' of neighboring symbols $\mathbf{y}_\lambda$, taken
from the set of clusters $M$. The integers $c_\lambda$, a.k.a. counting numbers,
are provided by the CVM in order to ascertain that each symbol is counted
exactly once in the corresponding free energy. Since $b(\mathbf{y})$ depends only
on local marginal probabilities, $p(\mathbf{y}_\lambda)$, it is tractable.
However, $b(\mathbf{y})$ need not necessarily form a valid probability
distribution function [16].



such that the probability of an assignment $\mathbf{x}$ is given by

$$ \Pr(\mathbf{x}) \propto \prod_{(i>j)} \psi_{ij}(x_i, x_j) \prod_i \phi_i(x_i). \tag{10} $$

The notation $(i > j)$ represents the set of all connected pairs
$(x_i, x_j)$.

The joint posterior probability of the channel can be written
as

$$ \Pr(\mathbf{x}|\mathbf{y}) = Z^{-1} \exp\left( -\frac{1}{2\sigma^2} \|\mathbf{y} - \mathbf{S}\mathbf{x}\|^2 \right). \tag{11} $$

Hence (11) defines the undirected graphical model

$$ \Pr(\mathbf{x}|\mathbf{y}) \propto \prod_{(i>j)} \psi_{ij}(x_i, x_j) \prod_i \phi_i(x_i, h_i), \tag{12} $$

where

$$ \psi_{ij}(x_i, x_j) = \exp\left( -\frac{R_{ij} x_i x_j}{\sigma^2} \right) \tag{13} $$

is a compatibility function representing the structure of the
system and the potential

$$ \phi_i(x_i, h_i) = \exp\left( \frac{h_i x_i}{\sigma^2} \right) \tag{14} $$

is the 'evidence' or local likelihood, which describes the
statistical dependency between the hidden variable $x_i$ and the
observed variable $h_i$$^5$. The matrix $\mathbf{R} = \mathbf{S}^T\mathbf{S}$
is the interference cross-correlation matrix and
$\mathbf{h} = \mathbf{S}^T\mathbf{y}$ is the output vector of
a filter matched to the channel's interference structure. The
lower pane in Fig. 1 presents the resulting graphical models
of the two channel examples considered in this work.
A. Generalized Belief Propagation 

The GBP algorithm is an extension of BP that has been
shown to provide better approximations in many applications.
The first step in applying GBP to a graph (10) is to define
regions (clusters) of nodes, which may intersect, and then pass
messages between these regions in a way analogous to BP.
Within each such region GBP performs exact inference, so
short cycles of nodes that are contained in a region cause no
problem.

Hence, a region that encompasses all nodes along the
shortest cycles is a desirable choice. Since the graphical
models of our 2-D channel examples contain interactions
between nearest neighbors and next-nearest neighbors, as
displayed in Fig. 1-(c,d), a natural choice of regions is a sliding
3 × 3 square of nodes (e.g., see Fig. 2). In all of our simulations
the selected GBP regions were of size 3 × 3. Surprisingly, the
computations required for GBP are only slightly more demanding
than those required for BP, and its complexity grows
exponentially only with the size of the chosen regions.

$^5$ Notice that for the non-binary finite-state input alphabet case, the Markov
random field modelling is identical, except for an additional external field
potential operating on each node, which can be absorbed into the $\phi_i$ term.
This additional potential arises from the auto-correlations $R_{ii}$, which
cannot be dropped from the sufficient statistics expression as in the binary case.
Fig. 2. Covering a 4 × 4 2-D channel by
3 × 3 regions used in GBP. Regions are
defined by sliding a 3 × 3 window along
the channel. The result is four regions for
this 4 × 4 system:
{1, 2, 3, 5, 6, 7, 9, 10, 11},
{2, 3, 4, 6, 7, 8, 10, 11, 12},
{5, 6, 7, 9, 10, 11, 13, 14, 15},
{6, 7, 8, 10, 11, 12, 14, 15, 16}.
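The region construction described in Fig. 2 can be sketched in a few lines of Python. The function name and the 1-based, row-major node numbering are assumptions chosen to match the node labels listed in the caption.

```python
def sliding_regions(N, w=3):
    """1-based, row-major node indices of every w x w window on an N x N grid."""
    regions = []
    for top in range(N - w + 1):
        for left in range(N - w + 1):
            regions.append([r * N + c + 1
                            for r in range(top, top + w)
                            for c in range(left, left + w)])
    return regions

regions_4x4 = sliding_regions(4)  # the four regions of the 4 x 4 example
```

In general an $N \times N$ channel yields $(N-2)^2$ such regions, e.g., 784 regions for a 30 × 30 system.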
VI. SIMULATION RESULTS

A. Quality of the Free Energy Approximation 

In order to evaluate the quality of our free energy
approximation, we performed Monte-Carlo simulations of several
channels. Fig. 3-(a) displays the root mean square (RMS)
error per symbol, in percent, between the approximated and
exact free energies as a function of the channel's size $N^2$
($N = 4,\ldots,9$, where a 9 × 9 channel is the largest case for
which exact computation was feasible). The results were averaged
over 500 realizations. As can be observed, the difference
between the approximated and exact free energies is minuscule
(on the order of $10^{-4}\%$). These results were obtained for a
specific channel, i.e., Wyner's hexagonal cellular network, as
depicted in Fig. 1-(b), with $\alpha = 0.5$ and a signal-to-noise
ratio (SNR) of 0 dB. Similar error performance was observed for all
other channels, throughout the entire interference range and for
a wide range of SNR values.

Fig. 3-(b) presents the CVM approximation of the free
energy per symbol as a function of the channel's size. The
results were averaged over 500 channel realizations (for small
channel sizes, $N \le 9$, we used the same channel realizations
as in Fig. 3-(a)). It can be observed that the free energy per
symbol converges with the size of the system, and that the
differences among realizations become smaller. In principle,
we could have simulated even larger systems, for which these
differences would have been smaller. However, it seems that a
30 × 30 system suffices as an approximation of the exact
free energy per symbol of infinite systems, and can thus provide
a proper estimate of the information rate.

B. Information Rate Computation 

The proposed GBP-based algorithm is used for estimating
the SIR of two examples of dispersive 2-D channels: a 2-D
ISI channel and a hexagonal Wyner cellular network. The
results were obtained by averaging over 1000 realizations of
30 × 30 channels. The standard deviations of the results were
small and are thus omitted from the figures.

a) 2-D ISI Channel: We compute the SIR of a binary
ISI channel with a non-trivial ($\alpha = 0.5$) interference structure,
as depicted in Fig. 1-(a). Fig. 4 presents the SIR, in
bits per symbol, computed using the GBP-based algorithm, as a
function of SNR. Also drawn are the lower and upper bounds
on the SIR, recently suggested by Chen and Siegel [10]. As
can be seen, the evaluated SIR is consistent with these tight bounds.

b) 2-D Wyner Cellular Network: In a similar way, we 
computed the SIR of an hexagonal topology Wyner model [1], 
under binary signaling, with a single user within each cell 
Fig. 3. (a) Root mean square (RMS) error (in %) in computing the free
energy per symbol, exactly and using the CVM, for $N \times N$ channels. The
results were obtained using 500 realizations of Wyner's hexagonal cellular
networks (assuming a single user per cell), with $\alpha = 0.5$ and SNR = 0 dB.
(b) The corresponding CVM free energy per symbol as a function of $N$. For
$N \le 9$ we used the same realizations as in (a). For larger systems (dashed
line) the exact free energy cannot be calculated.
Fig. 4. A 2-D ISI channel: SIR, in bits per symbol, evaluated using
GBP-based simulations (squares and a solid line), as a function of SNR. Also
shown are upper (UB, dashed-dotted) and lower (LB, dashed) bounds on the
SIR [10].



(i.e., $K = 1$ in Wyner's notation). Fig. 5 displays the SIR
calculated for the possible range of the inter-cell interference
scaling $\alpha$, for three SNR levels. For comparison we also
present the Gaussian signaling capacity, as derived by Wyner.
As may be expected, for low SNR (-10 dB) the SIR and
Wyner's capacity almost coincide. For the intermediate SNR
level (0 dB) Wyner's capacity provides a tight upper bound on
the SIR for $\alpha < 0.5$. Note, in passing, that since the capacity
of a binary channel is bounded between the SIR and Wyner's
Gaussian capacity, one can also infer the capacity in these
low and intermediate SNR regimes. As for high SNR (8 dB),
the SIR saturates at the 1-bit bound for almost all values of $\alpha$.




Fig. 5. Hexagonal Wyner network: SIR (squares and solid line) and Gaussian
signaling capacity (dashed), in bits per channel use, as a function of $\alpha$ for
three SNR values: -10, 0 and 8 dB, and a single user within each cell ($K = 1$).
Fig. 6. Hexagonal Wyner network: SIR (squares and solid line) and capacity
(dashed), in bits per channel use, as a function of SNR for $\alpha = 0.5$ and
$K = 1$.



In Fig. 6 we evaluate the SIR for a fixed $\alpha = 0.5$ as a
function of SNR. The SIR coincides with Wyner's capacity
for SNR < 4 dB.

It should be emphasized that the analysis could have been
performed, in a similar manner, for the case of several
intra-cell users, i.e., $K > 1$. This case corresponds to the case
of $K = 1$ in which $(K+1)$-ary signaling drawn from a binomial
distribution replaces the equiprobable binary signaling.
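To make the $(K+1)$-ary alphabet concrete, the sketch below computes the distribution of the superposition of $K$ intra-cell users. It assumes, as one reading of the statement above, that each user transmits an independent equiprobable $\pm 1$ symbol and that the $K$ symbols add with equal gain; the function name is hypothetical.

```python
from math import comb

def superposed_alphabet(K):
    """Distribution of the sum of K independent equiprobable +/-1 symbols.

    With m users sending +1 the sum is 2*m - K, which occurs with
    binomial probability C(K, m) / 2^K; there are K + 1 distinct values.
    """
    return {2 * m - K: comb(K, m) / 2**K for m in range(K + 1)}

dist = superposed_alphabet(2)  # two users per cell: a ternary alphabet
```

For $K = 1$ this reduces to the equiprobable binary alphabet used throughout the paper.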

VII. DISCUSSION

In this paper we introduced a method for simulation-based
computation of the information rates of 2-D finite-state
input channels with memory. Our method is based on
a connection between the information rate and the free energy,
and on a graphical-models-based method of approximating this
free energy. The quality of the approximation was compared
to the exact free energy using small channels, and was found
to be practically accurate, consistently across the SNR range
and over the possible interference range. This behavior is
then conjectured to hold for large, yet finite, systems, for
which the information rate was estimated. In order to validate
our method we compared the resulting information rate to
formerly calculated bounds.

The physics and graphical models literature does not pro- 
vide a rigorous explanation for this remarkable quality of 
approximation, as provided by the GBP-based CVM, thus 
research in this direction is currently underway. 